Efficient CPU-GPU cooperative computing for solving the subset-sum problem
Abstract
A heterogeneous CPU-GPU system is a powerful way to accelerate compute-intensive applications such as the subset-sum problem. Many parallel algorithms for solving the problem have been implemented on graphics processing units (GPUs). However, these GPU implementations may fail to fully utilize all the CPU cores and the GPU resources. While the GPU performs a computational task, only one CPU core is used to control the GPU, and all the remaining CPU cores sit idle, which wastes a large amount of available CPU resources. This paper proposes an efficient CPU-GPU cooperative computing scheme for solving the subset-sum problem that enables full utilization of the computing power of both CPUs and GPUs. To find the most appropriate task distribution ratio between CPUs and GPUs, the paper establishes a simple but effective task distribution model. Because high CPU-GPU communication overhead and an unbalanced workload between CPUs and GPUs can greatly reduce performance, an incremental data transfer method is proposed to reduce the communication overhead, and a feedback-based dynamic task distribution scheme is designed to balance the workload between CPUs and GPUs at runtime. The experimental results show that CPU-GPU cooperative computing achieves a significant performance benefit over CPU-only or GPU-only computing. Copyright © 2015 John Wiley & Sons, Ltd.
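The abstract does not spell out the task distribution model or the feedback rule, so the following is only a minimal sketch of one plausible form of feedback-based distribution, not the authors' method. It assumes that the ideal GPU share equals the GPU's fraction of the combined measured throughput (so that both sides finish a round at about the same time) and that the ratio is nudged toward that target after every round. The function names, the `damping` parameter, and the simulated throughputs are all hypothetical.

```python
def split_tasks(total, gpu_ratio):
    """Split `total` work items between CPU and GPU; returns (cpu_items, gpu_items)."""
    gpu_items = round(total * gpu_ratio)
    return total - gpu_items, gpu_items


def update_ratio(cpu_items, cpu_time, gpu_items, gpu_time, old_ratio, damping=0.5):
    """One feedback step (assumed rule): move the GPU share toward the
    share implied by the throughputs measured in the last round."""
    if cpu_items == 0 or gpu_items == 0 or cpu_time <= 0 or gpu_time <= 0:
        return old_ratio  # not enough information to re-estimate
    cpu_rate = cpu_items / cpu_time  # items per second on the CPU side
    gpu_rate = gpu_items / gpu_time  # items per second on the GPU side
    target = gpu_rate / (cpu_rate + gpu_rate)
    return old_ratio + damping * (target - old_ratio)


def simulate(total_per_round, rounds, cpu_rate, gpu_rate, ratio=0.5):
    """Toy driver: devices with fixed (hypothetical) throughputs stand in
    for real timed CPU and GPU work."""
    for _ in range(rounds):
        cpu_items, gpu_items = split_tasks(total_per_round, ratio)
        cpu_time = cpu_items / cpu_rate
        gpu_time = gpu_items / gpu_rate
        ratio = update_ratio(cpu_items, cpu_time, gpu_items, gpu_time, ratio)
    return ratio


if __name__ == "__main__":
    # With a GPU three times faster than the CPU cores combined,
    # the ratio should drift toward giving the GPU 3/4 of the work.
    print(simulate(1000, 20, cpu_rate=100.0, gpu_rate=300.0))
```

In a real implementation the two times would come from timing the CPU threads and the GPU kernels of the previous round, and the damping factor trades responsiveness against stability when the measurements are noisy.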
Similar resources
A novel cooperative accelerated parallel two-list algorithm for solving the subset-sum problem on a hybrid CPU-GPU cluster
Many parallel algorithms have recently been developed to accelerate solving the subset-sum problem on a heterogeneous CPU–GPU system. However, within each compute node, only one CPU core is used to control one GPU and all the remaining CPU cores are in idle state, which leads to a large number of CPU cores being wasted. In this paper, based on a cost-optimal parallel two-list algorithm, we prop...
GPU implementation of a parallel two-list algorithm for the subset-sum problem
The subset-sum problem is a well-known non-deterministic polynomial-time complete (NP-complete) decision problem. This paper proposes a novel and efficient implementation of a parallel two-list algorithm for solving the problem on a graphics processing unit (GPU) using Compute Unified Device Architecture (CUDA). The algorithm is composed of a generation stage, a pruning stage, and a search stag...
Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations
Today, it is possible to associate multiple CPUs and multiple GPUs in a single shared memory architecture. Using these resources efficiently in a seamless way is a challenging issue. In this paper, we propose a parallelization scheme for dynamically balancing work load between multiple CPUs and GPUs. Most tasks have a CPU and GPU implementation, so they can be executed on any processing unit. W...
Cooperative CPU-GPU Frequency Capping (Co-Cap) for Energy Efficient Mobile Gaming
Mobile platforms are increasingly using Heterogeneous MultiProcessor Systems-on-Chip (HMPSoCs) with differentiated processing cores and GPUs to achieve high performance for graphics-intensive applications such as mobile games. Traditionally, separate CPU and GPU governors are deployed in order to achieve energy efficiency through Dynamic Voltage Frequency Scaling (DVFS), but miss opportunities ...
An OpenMP Programming Toolkit for Hybrid CPU/GPU Clusters Based on Software Unified Memory
Recently, hybrid CPU/GPU cluster has drawn much attention from the researchers of high performance computing because of amazing energy efficiency and adaptable resource exploitation. However, the programming of hybrid CPU/GPU clusters is very complex because it requires users to learn new programming interfaces such as CUDA and OpenCL, and combine them with MPI and OpenMP. To address this probl...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 28, Issue
Pages -
Publication date: 2016